This is the Supplementary Material for "De-biased Court's View Generation with Causality" (paper ID 2012).

### ./Code

- To process the data, run data-preprocess1.ipynb and data-preprocess2.ipynb
- The hyperparameters are shown below:

| Name                | Value | Note                                                         |
| ------------------- | ----- | ------------------------------------------------------------ |
| hidden_dim          | 256   | dimension of RNN hidden states                               |
| emb_dim             | 300   | dimension of word embeddings                                 |
| batch_size          | 16    | minibatch size                                               |
| max_enc_steps       | 300   | max timesteps of encoder (max source text tokens)            |
| max_dec_steps       | 150   | max timesteps of decoder (max summary tokens)                |
| beam_size           | 4     | beam size for beam search decoding                           |
| min_dec_steps       | 35    | minimum sequence length of the generated summary; applies only in beam search decoding mode |
| vocab_size          | 50000 | size of vocabulary; words are read from the vocabulary file in order. If the file contains fewer words than this number, or if this number is set to 0, all words in the file are used |
| lr                  | 0.15  | learning rate                                                |
| keep_prob           | 0.5   | dropout keep probability                                     |
| adagrad_init_acc    | 0.1   | initial accumulator value for Adagrad                        |
| rand_unif_init_mag  | 0.02  | magnitude for LSTM cells' random uniform initialization      |
| trunc_norm_init_std | 0.1   | std of the truncated normal initializer, used for initializing everything else |
| max_grad_norm       | 2.0   | for gradient clipping                                        |
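
For convenience, the values in the table can be collected in one place. The following is a minimal sketch, not code from this repository: the `HParams` class and the `truncate_vocab` helper are hypothetical names introduced here for illustration, and `truncate_vocab` simply restates the `vocab_size` rule from the table in plain Python.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HParams:
    """Hypothetical container; the default values are copied from the table."""
    hidden_dim: int = 256
    emb_dim: int = 300
    batch_size: int = 16
    max_enc_steps: int = 300
    max_dec_steps: int = 150
    beam_size: int = 4
    min_dec_steps: int = 35
    vocab_size: int = 50000
    lr: float = 0.15
    keep_prob: float = 0.5
    adagrad_init_acc: float = 0.1
    rand_unif_init_mag: float = 0.02
    trunc_norm_init_std: float = 0.1
    max_grad_norm: float = 2.0


def truncate_vocab(words: List[str], vocab_size: int) -> List[str]:
    """Keep the first `vocab_size` words in file order.

    If `vocab_size` is 0, or the list has fewer words than `vocab_size`,
    all words are kept (the rule described in the table).
    """
    if vocab_size == 0 or len(words) <= vocab_size:
        return list(words)
    return words[:vocab_size]
```

Freezing the dataclass keeps the settings immutable once created, so a run cannot silently change them after start-up.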

- The training, evaluation, and decoding steps are the same as in https://github.com/becxer/pointer-generator
- To evaluate the results, first run rouge_id_test.py, then run metric.py and bertscore.py
- We will add details later.

### ./Data

- Four samples are included to show the data structure.
- We will release the full dataset.

### ./Showcase

- Some examples of the generated results.

### Others

- All models were trained on 2 V100 GPUs (16 GB each).

- Some training details:

| Method      | Avg. runtime (h) | # of parameters |
| ----------- | -------------- | ----------- |
| S2S(wS)     | 22             | 30,789,836  |
| PGN(wS)     | 25             | 30,791,161  |
| AC-NLGw/oD  | 7              | 19,972,418  |
| AC-NLGw/oBA | 28             | 34,622,843  |
| AC-NLGw/oCA | 27             | 45,244,852  |
| AC-NLG(wS)  | 29             | 49,010,612  |
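
As a rough guide to reading the parameter counts, the sketch below shows how weights are usually tallied for standard layers, using `emb_dim`, `hidden_dim`, and `vocab_size` from the hyperparameter table. The helper names and the choice of layers are assumptions made for illustration; the sketch does not describe the actual architectures and is not meant to reproduce any row of the table.

```python
def embedding_params(vocab_size: int, emb_dim: int) -> int:
    """One emb_dim-sized vector per vocabulary entry."""
    return vocab_size * emb_dim


def lstm_params(input_dim: int, hidden_dim: int) -> int:
    """Standard LSTM cell: 4 gates, each with input weights, recurrent weights, and a bias."""
    return 4 * ((input_dim + hidden_dim) * hidden_dim + hidden_dim)


def dense_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Fully connected layer: weight matrix plus optional bias vector."""
    return n_in * n_out + (n_out if bias else 0)


# Illustrative tally for a few generic layers (assumed, not the paper's models).
emb = embedding_params(50000, 300)   # word embeddings
enc = 2 * lstm_params(300, 256)      # a bidirectional LSTM counts two cells
out = dense_params(256, 50000)       # projection onto the vocabulary
print(emb, enc, out, emb + enc + out)
```

In a tally like this, the embedding matrix and the vocabulary projection account for most of the weights, so `vocab_size` has a large effect on model size.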

